
    Dynamic Appearance: A Video Representation for Action Recognition with Joint Training

    The static appearance of a video may impede the ability of a deep neural network to learn motion-relevant features in video action recognition. In this paper, we introduce a new concept, Dynamic Appearance (DA), which summarizes the appearance information relating to movement in a video while filtering out the static information considered unrelated to motion. We consider distilling the dynamic appearance from raw video data as a means of efficient video understanding. To this end, we propose the Pixel-Wise Temporal Projection (PWTP), which projects the static appearance of a video into a subspace within its original vector space, while the dynamic appearance is encoded in the projection residual describing a special motion pattern. Moreover, we integrate the PWTP module with a CNN or Transformer into an end-to-end training framework, which is optimized using multi-objective optimization algorithms. We provide extensive experimental results on four action recognition benchmarks: Kinetics400, Something-Something V1, UCF101 and HMDB51.
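    The abstract does not give the PWTP formulas, and in the paper PWTP is trained end to end with the network. As a rough illustration only, the sketch below (hypothetical function `temporal_projection`) projects each pixel's temporal signal onto a fixed orthonormal subspace of R^T and keeps the projection residual as the "dynamic" part. The default subspace, the per-pixel temporal mean, is an assumption chosen for simplicity.

```python
import math
from typing import List, Optional, Sequence, Tuple

# video[t][p] = intensity of pixel p in frame t (frames flattened to 1-D)
Video = List[List[float]]


def temporal_projection(
    video: Video, basis: Optional[Sequence[Sequence[float]]] = None
) -> Tuple[Video, Video]:
    """Split each pixel's temporal signal into its projection onto a fixed
    subspace of R^T ("static" part) and the residual ("dynamic" part).

    `basis` must be an orthonormal list of length-T vectors. By default it is
    the single normalised all-ones vector, so the static part is the
    per-pixel temporal mean (an illustrative assumption, not the learned
    projection described in the paper).
    """
    num_frames, num_pixels = len(video), len(video[0])
    if basis is None:
        basis = [[1.0 / math.sqrt(num_frames)] * num_frames]
    static = [[0.0] * num_pixels for _ in range(num_frames)]
    for p in range(num_pixels):
        signal = [video[t][p] for t in range(num_frames)]
        for u in basis:
            # Coefficient of this pixel's signal along basis vector u.
            coef = sum(ui * xi for ui, xi in zip(u, signal))
            for t in range(num_frames):
                static[t][p] += coef * u[t]
    # The residual holds whatever the subspace cannot explain.
    dynamic = [
        [video[t][p] - static[t][p] for p in range(num_pixels)]
        for t in range(num_frames)
    ]
    return static, dynamic
```

    With the default basis, a pixel whose intensity never changes has an all-zero residual, while a pixel that brightens over time keeps a non-zero residual, which is the intuition behind separating static from dynamic appearance.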

    Learning an evolved mixture model for task-free continual learning

    Recently, continual learning (CL) has gained significant interest because it enables deep learning models to acquire new knowledge without forgetting previously learnt information. However, most existing works require knowing the task identities and boundaries, which is unrealistic in practice. In this paper, we address a more challenging and realistic setting in CL, namely Task-Free Continual Learning (TFCL), in which a model is trained on non-stationary data streams with no explicit task information. To address TFCL, we introduce an evolved mixture model whose network architecture is dynamically expanded to adapt to the data distribution shift. We implement this expansion mechanism by evaluating the probability distance between the knowledge stored in each mixture model component and the current memory buffer using the Hilbert-Schmidt Independence Criterion (HSIC). We further introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload while preserving memory diversity. Empirical results demonstrate that the proposed approach achieves excellent performance. Comment: Accepted by the 29th IEEE International Conference on Image Processing (ICIP 2022).
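    The abstract does not say how HSIC is turned into a distance between a component's stored knowledge and the memory buffer. For reference only, the standard biased empirical HSIC for n paired samples is tr(KHLH)/(n-1)^2, where K and L are kernel Gram matrices and H = I - (1/n)11^T is the centring matrix. The sketch below implements just that estimator with Gaussian kernels; the function names and the bandwidth `sigma` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Sample = Sequence[float]
Gram = List[List[float]]


def gaussian_gram(samples: Sequence[Sample], sigma: float) -> Gram:
    """Gram matrix of the Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))."""
    n = len(samples)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(samples[i], samples[j]))
            gram[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return gram


def double_center(gram: Gram) -> Gram:
    """Return H K H with H = I - (1/n) 1 1^T (subtract row/column means)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs: Sequence[Sample], ys: Sequence[Sample], sigma: float = 1.0) -> float:
    """Biased empirical HSIC between paired samples (xs[i], ys[i])."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired samples")
    k_centered = double_center(gaussian_gram(xs, sigma))
    l_gram = gaussian_gram(ys, sigma)
    # tr(HKH L) = sum_ij (HKH)_ij L_ji, and L is symmetric.
    trace = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2
```

    A constant second variable gives an HSIC of (numerically) zero, since the centred Gram matrix sums to zero, whereas strongly dependent samples give a positive value.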

    Binary morphological shape-based interpolation applied to 3-D tooth reconstruction

    In this paper, we propose an interpolation algorithm based on a mathematical morphology morphing approach. The aim of this algorithm is to reconstruct the n-dimensional object from a group of (n-1)-dimensional sets representing sections of that object. The morphing transformation modifies pairs of consecutive sets so that they become closer in shape and size. The interpolated set is obtained when the two consecutive sets are made idempotent by the morphing transformation. We prove the convergence of the morphological morphing. The entire object is modeled by successively interpolating a certain number of intermediary sets between each pair of consecutive given sets. We apply the interpolation algorithm to 3-D tooth reconstruction.
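    The abstract does not define the morphing transformation itself, so the sketch below is a hypothetical, simplified stand-in rather than the paper's method. Working on 2-D binary slices stored as sets of pixel coordinates, it starts from the intersection and the union of two consecutive slices, then repeatedly dilates the lower set inside the union and erodes the upper set down towards the lower one (cross-shaped structuring element) until the two coincide, i.e. until a further step changes nothing. Because the lower set only grows and the upper set only shrinks, the loop always terminates on finite sets. Note that disjoint slices yield an empty result.

```python
from typing import AbstractSet, Set, Tuple

Point = Tuple[int, int]
# Cross-shaped (4-connected) structuring element, including the origin.
CROSS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def dilate(pixels: AbstractSet[Point]) -> Set[Point]:
    return {(x + dx, y + dy) for (x, y) in pixels for (dx, dy) in CROSS}


def erode(pixels: AbstractSet[Point]) -> Set[Point]:
    return {
        (x, y) for (x, y) in pixels
        if all((x + dx, y + dy) in pixels for (dx, dy) in CROSS)
    }


def interpolate(a: AbstractSet[Point], b: AbstractSet[Point]) -> Set[Point]:
    """Hypothetical morphological midpoint of two binary slices."""
    lower = set(a) & set(b)  # grows, but never leaves `upper`
    upper = set(a) | set(b)  # shrinks, but always contains `lower`
    while lower != upper:
        lower = dilate(lower) & upper
        upper = erode(upper) | lower
    return lower
```

    Further intermediary slices could be produced by applying `interpolate` recursively (for example between a given slice and the computed midpoint) and stacking the results along the third axis.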

    Masked Image Residual Learning for Scaling Deeper Vision Transformers

    Deeper Vision Transformers (ViTs) are more challenging to train. We expose a degradation problem in the deeper layers of ViT when using masked image modeling (MIM) for pre-training. To ease the training of deeper ViTs, we introduce a self-supervised learning framework called Masked Image Residual Learning (MIRL), which significantly alleviates the degradation problem, making scaling ViTs along depth a promising direction for improving performance. We reformulate the pre-training objective for the deeper layers of ViT as learning to recover the residual of the masked image. We provide extensive empirical evidence showing that deeper ViTs can be effectively optimized using MIRL and easily gain accuracy from increased depth. With the same level of computational complexity as ViT-Base and ViT-Large, we instantiate 4.5× and 2× deeper ViTs, dubbed ViT-S-54 and ViT-B-48. The deeper ViT-S-54, costing 3× less than ViT-Large, achieves performance on par with ViT-Large. ViT-B-48 achieves 86.2% top-1 accuracy on ImageNet. On the one hand, deeper ViTs pre-trained with MIRL exhibit excellent generalization capabilities on downstream tasks, such as object detection and semantic segmentation. On the other hand, MIRL demonstrates high pre-training efficiency: with less pre-training time, MIRL yields competitive performance compared to other approaches.
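    The abstract does not state what the residual is taken with respect to. Assuming it is the difference between the masked image patches and the reconstruction produced by the shallower layers, an MAE-style objective (loss computed on masked patches only) could be sketched as follows. The flat-list patch representation and the function names are hypothetical.

```python
from typing import List, Sequence

Patch = List[float]


def masked_mse(pred: Sequence[Patch], target: Sequence[Patch], mask: Sequence[bool]) -> float:
    """Mean squared error over the pixels of masked patches only."""
    total, count = 0.0, 0
    for p, t, is_masked in zip(pred, target, mask):
        if is_masked:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(t)
    return total / count if count else 0.0


def residual_objective(
    image: Sequence[Patch],
    shallow_pred: Sequence[Patch],
    deep_pred: Sequence[Patch],
    mask: Sequence[bool],
) -> float:
    # Assumed residual target: what the shallow reconstruction missed.
    residual = [
        [x - s for x, s in zip(img_p, sh_p)]
        for img_p, sh_p in zip(image, shallow_pred)
    ]
    main_loss = masked_mse(shallow_pred, image, mask)
    residual_loss = masked_mse(deep_pred, residual, mask)
    return main_loss + residual_loss
```

    Under this assumption the deeper layers are only asked to explain what the shallower layers left over, rather than the whole image again.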

    Defining Image Memorability using the Visual Memory Schema

    Memorability of an image is a characteristic determined by the human observers’ ability to remember images they have seen. Yet recent work on image memorability defines it as an intrinsic property that can be obtained independently of the observer. The current study aims to enhance our understanding and prediction of image memorability, improving upon existing approaches by incorporating the properties of cumulative human annotations. We propose a new concept called the Visual Memory Schema (VMS), referring to an organization of image components that human observers share when encoding and recognizing images. The concept of VMS is operationalised by asking human observers to define memorable regions of images they were asked to remember during an episodic memory test. We then statistically assess the consistency of VMSs across observers for either correctly or incorrectly recognised images. The associations of the VMSs with eye fixations and saliency are also analysed separately. Lastly, we adapt various deep learning architectures for the reconstruction and prediction of memorable regions in images and analyse the results when using transfer learning at the outputs of different convolutional network layers.
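    The abstract does not name the consistency statistic. One simple, hypothetical choice is the mean pairwise intersection-over-union (Jaccard index) of the regions that different observers marked on the same image, sketched below with regions stored as sets of pixel coordinates. Computing it separately for correctly and incorrectly recognised images would mirror the comparison described above.

```python
from itertools import combinations
from typing import AbstractSet, Sequence, Tuple

Region = AbstractSet[Tuple[int, int]]


def iou(a: Region, b: Region) -> float:
    union = a | b
    # Two empty annotations are treated as full agreement.
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_iou(regions: Sequence[Region]) -> float:
    """Average IoU over every pair of observers' annotations of one image."""
    pairs = list(combinations(regions, 2))
    if not pairs:
        raise ValueError("need annotations from at least two observers")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)
```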

    Steganalysis of 3D objects using statistics of local feature sets

    3D steganalysis aims to identify subtle, invisible changes produced in graphical objects through digital watermarking or steganography. Sets of statistical representations of 3D features, extracted from both cover and stego 3D mesh objects, are used as inputs to machine learning classifiers in order to decide whether any information was hidden in the given graphical object. The features proposed in this paper include those representing the local object curvature, vertex normals, and the local geometry representation in the spherical coordinate system. The effectiveness of these features is tested in various combinations with other features used for 3D steganalysis. The relevance of each feature for 3D steganalysis is assessed using the Pearson correlation coefficient. Six different 3D watermarking and steganographic methods are used to create the stego-objects used in the evaluation study.
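    Feature relevance via Pearson's correlation can be illustrated as follows, assuming each feature is summarised by one scalar per object and the label is 0 for cover and 1 for stego objects (which makes r the point-biserial correlation). The feature names and the ranking helper are hypothetical, and the paper's exact procedure may differ.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        # r is undefined for a constant input; treat it as uninformative.
        return 0.0
    return cov / (sx * sy)


def rank_features(features: Dict[str, Sequence[float]], labels: Sequence[int]) -> List[str]:
    """Order feature names by |r| with the cover/stego label, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```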

    Enhancing reliability and efficiency for real-time robust adaptive steganography using cyclic redundancy check codes

    The development of multimedia and deep learning technology brings new challenges to steganography and steganalysis techniques. Meanwhile, robust steganography, a class of new techniques aiming to solve the problem of covert communication over lossy channels, has become a new research hotspot in the field of information hiding. To improve the communication reliability and efficiency of current real-time robust steganography methods, this paper proposes a concatenated code composed of Syndrome–Trellis codes (STC) and cyclic redundancy check (CRC) codes. The proposed enhanced robust adaptive steganography framework is characterized by strong error-detection capability, high coding efficiency, and low embedding costs. On this basis, three adaptive steganographic methods resisting JPEG compression and detection are proposed. Then, the fault tolerance of the proposed steganography methods is analyzed using the residual model of JPEG compression, yielding the appropriate coding parameters. Experimental results show that the proposed methods are significantly more robust against compression and are harder to detect with statistics-based steganalytic methods.
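    The CRC half of the concatenated code can be sketched as framing: each message segment is extended with a checksum before being passed to the STC embedder, and the receiver verifies the checksum after extraction to detect segments corrupted by compression. This sketch uses CRC-32 from Python's `zlib` module and a hypothetical segment length; the abstract gives neither the paper's CRC polynomial nor its segment size, and the STC stage is omitted.

```python
import zlib
from typing import List, Tuple

CRC_BYTES = 4  # CRC-32 produces a 32-bit checksum


def add_crc(segment: bytes) -> bytes:
    """Append the big-endian CRC-32 of the segment."""
    return segment + zlib.crc32(segment).to_bytes(CRC_BYTES, "big")


def check_crc(frame: bytes) -> Tuple[bytes, bool]:
    """Split a received frame into payload and a flag telling whether the CRC matches."""
    if len(frame) < CRC_BYTES:
        return b"", False
    payload, tag = frame[:-CRC_BYTES], frame[-CRC_BYTES:]
    return payload, zlib.crc32(payload).to_bytes(CRC_BYTES, "big") == tag


def frame_message(message: bytes, segment_len: int) -> List[bytes]:
    """Cut the message into segments and protect each one with a CRC."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [
        add_crc(message[i:i + segment_len])
        for i in range(0, len(message), segment_len)
    ]
```

    Per-segment checksums let a receiver locate which parts of an extracted message were damaged instead of discarding the whole message; shorter segments localise errors better but spend more capacity on checksums.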

    Watermarking of 3D shapes using localized constraints

    This paper develops a digital watermarking methodology for 3-D graphical objects defined by polygonal meshes. In watermarking or fingerprinting, the aim is to embed a code in a given medium without producing identifiable changes to it. One should be able to retrieve the embedded information even after the shape has undergone various modifications. Two blind watermarking techniques that apply perturbations to the local geometry of selected vertices are described in this paper. The proposed methods produce localized changes of vertex locations that do not alter the mesh topology. A study of the effects caused by vertex location modification is provided for a general class of surfaces. The robustness of the proposed algorithms is tested against noise perturbation and object cropping.
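    The abstract does not detail the perturbation rule. As a hypothetical illustration of a blind, topology-preserving scheme, the sketch below swaps in quantization index modulation: a vertex is moved along the line through its neighbours' centroid so that its distance to that centroid lands in the centre of a quantization bin whose index parity encodes one bit. Extraction needs only the marked mesh, and any displacement of the vertex smaller than half the step leaves the bit intact. It assumes the neighbours themselves are not moved, and the `step` value is an arbitrary choice.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def centroid(points: Sequence[Vec3]) -> Vec3:
    n = len(points)
    return (
        sum(p[0] for p in points) / n,
        sum(p[1] for p in points) / n,
        sum(p[2] for p in points) / n,
    )


def embed_bit(vertex: Vec3, neighbours: Sequence[Vec3], bit: int, step: float) -> Vec3:
    """Move `vertex` radially so floor(distance / step) has the parity of `bit`."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    c = centroid(neighbours)
    d = [vertex[i] - c[i] for i in range(3)]
    dist = math.sqrt(sum(x * x for x in d))
    if dist == 0.0:
        raise ValueError("vertex coincides with its neighbours' centroid")
    q = math.floor(dist / step)
    if q % 2 != bit:
        q += 1
    # Centre of the chosen bin, for maximum tolerance to perturbation.
    scale = (q + 0.5) * step / dist
    return (c[0] + d[0] * scale, c[1] + d[1] * scale, c[2] + d[2] * scale)


def extract_bit(vertex: Vec3, neighbours: Sequence[Vec3], step: float) -> int:
    """Blind extraction: only the marked vertex and its neighbours are needed."""
    return math.floor(math.dist(vertex, centroid(neighbours)) / step) % 2
```

    Only vertex coordinates change, so the mesh connectivity (and hence topology) is untouched, in line with the localized changes described above.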